-
-
Notifications
You must be signed in to change notification settings - Fork 728
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
collect json paths in indexing #2231
Conversation
0813b35
to
f0f121e
Compare
f0f121e
to
a5dacd0
Compare
a5dacd0
to
3628d6d
Compare
stacker/src/arena_hashmap.rs
Outdated
/// | ||
/// # Safety | ||
/// Any call to get or mutate_or_create after this call is undefined behavior. | ||
pub unsafe fn iter_mut_keys<F: Fn(&mut [u8])>(&mut self, cb: F) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to do the remap here?
COuld that be done in a safe not mut way upon serialization? (in serialize postings)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Why do we need to do the remap here?
We need the keys in lexicographical order, for the serialization.
I think I can change it to consume the hashmap and remove the unsafe
This PR surely saves memory, but i have no clue what impact on indexing throughput is. (Of course, the next step is to push the ids at a higher level to avoid building the json path construction, but you were right to not ship that in that PR.) |
I didn't measure a lot, but it seemed relatively unchanged. The extra look-ups seem to be set of by the less copying and faster hash calculation |
57214c7
to
fb11895
Compare
fb11895
to
24658cd
Compare
* collect json paths in indexing * remove unsafe iter_mut_keys
Partially implementing #2015 and preparation for #2215
Next step would be to remove the
JsonTermWriter
and use the same technique in fast field json paths